We consider the marginal models of Liang and Zeger [Biometrika
73 (1986) 13-22] for the analysis of longitudinal data and we develop
a theory of statistical inference for such models. We prove the existence,
weak consistency and asymptotic normality of a sequence of
estimators defined as roots of pseudo-likelihood equations.


1. Introduction. Longitudinal data sets arise in biostatistics and life- 
time testing problems when the responses of the individuals are recorded 
repeatedly over a period of time. By controlling for individual differences, 
longitudinal studies are well-suited to measure change over time. On the 
other hand, they require the use of special statistical techniques because the 
responses on the same individual tend to be strongly correlated. In a seminal
paper Liang and Zeger (1986) proposed the use of generalized linear models
(GLM) for the analysis of longitudinal data.

In a cross-sectional study, a GLM is used when there are reasons to believe
that each response $y_i$ depends on an observable vector $x_i$ of covariates [see
the monograph of McCullagh and Nelder (1989)]. Typically this dependence
is specified by an unknown parameter $\beta$ and a link function $\mu$ via the
relationship $\mu_i(\beta) = \mu(x_i^T\beta)$, where $\mu_i(\beta)$ is the mean of $y_i$.
For one-dimensional observations, the maximum quasi-likelihood estimator
$\hat\beta_n$ is defined as the solution of the equation

(1)  $\sum_{i=1}^n \dot\mu_i(\beta)\, v_i(\beta)^{-1} (y_i - \mu_i(\beta)) = 0,$

where $\dot\mu_i$ is the derivative of $\mu_i$ and $v_i(\beta)$ is the variance of $y_i$. Note that this
equation simplifies considerably if we assume that $v_i(\beta) = \phi_i \dot\mu(x_i^T\beta)$, with a



Received May 2003; revised March 2004. 

1 Supported by the Natural Sciences and Engineering Research Council of Canada. 
AMS 2000 subject classifications. Primary 62F12; secondary 62J12. 
Key words and phrases. Generalized estimating equations, generalized linear model, 
consistency, asymptotic normality. 



This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Statistics, 

2005, Vol. 33, No. 2, 522-541. This reprint differs from the original in pagination 

and typographic detail. 




R. M. BALAN AND I. SCHIOPU-KRATINA 



nuisance scale parameter $\phi_i$. In fact (1) is a genuine likelihood equation if
the $y_i$'s are independent with densities
$c(y_i,\phi_i)\exp\{\phi_i^{-1}[(x_i^T\beta)y_i - b(x_i^T\beta)]\}$.
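
As a rough illustration of how an equation of type (1) can be solved numerically, the following Python sketch applies the Newton-Raphson method to one special case: a scalar parameter, the log link $\mu(t) = \exp(t)$ and $\phi_i = 1$, for which (1) reduces to $\sum_i x_i (y_i - \exp(x_i\beta)) = 0$. The function name `solve_quasi_score`, the starting value and the toy data are assumptions made for this example only; they do not come from the article.

```python
import math


def solve_quasi_score(x, y, beta=0.0, tol=1e-12, max_iter=50):
    """Newton-Raphson for sum_i x_i * (y_i - exp(x_i * beta)) = 0 (scalar beta)."""
    for _ in range(max_iter):
        mu = [math.exp(xi * beta) for xi in x]
        # value of the estimating function at the current beta
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # minus the derivative of the estimating function with respect to beta
        info = sum(xi * xi * mi for xi, mi in zip(x, mu))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical noise-free data generated with beta = 0.5, so the root is 0.5.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.5 * xi) for xi in x]
beta_hat = solve_quasi_score(x, y)
```

With noisy responses the same iteration applies unchanged; only the location of the root moves.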

In a longitudinal study, the components of an observation $y_i = (y_{i1},\ldots,y_{im})^T$
represent repeated measurements at times $1,\ldots,m$ for subject $i$. The approach
proposed by Liang and Zeger is to impose the usual assumptions of
a GLM only for the marginal scalar observations $y_{ij}$ and the $p$-dimensional
design vectors $x_{ij}$. If the correlation matrices within individuals are known
(but the entire likelihood is not specified), then the $m$-dimensional version
of (1) becomes a generalized estimating equation (GEE).

In this article we prove the existence, weak consistency and asymptotic 
normality of a sequence of estimators, defined as solutions (roots) of pseudo- 
likelihood equations [see Shao (1999), page 315]. We work within a non- 
parametric set-up similar to that of Liang and Zeger and build upon the 
impressive work of Xie and Yang (2003). 

Our approach differs from that of Liang and Zeger (1986), Xie and Yang
(2003) and Schiopu-Kratina (2003) in the treatment of the correlation structure
of the data recorded for the same individual across time. As in Rao
(1998), we first obtain a sequence of preliminary consistent estimators $(\hat\beta_n)_n$
of the main parameter $\beta_0$ (under the "working independence assumption"),
which we use to consistently estimate the average of the true individual
correlations. We then create the pseudo-likelihood equations whose solutions
provide our final sequence of consistent estimators of the main parameter.
In practice, the analyst would first use numerical approximation methods
(like the Newton-Raphson method) to solve a simple estimating equation,
where each individual correlation matrix is the identity matrix. The next
step would be to solve for $\beta$ in the pseudo-likelihood equation, in which all
the quantities can be calculated from the data. This approach eliminates the
need to introduce nuisance parameters or to guess at the correlation structures,
and thus avoids some of the problems associated with these methods
[see pages 112 and 113 of Fahrmeir and Tutz (1994)]. We note that the
assumptions that we require for this two-step procedure [our conditions
$(\widetilde{AH})$, $(\tilde I_w)$, $(\tilde C_w)$] are only slightly more stringent than those of Xie and Yang
(2003). They reduce to conditions related to the "working independence
assumption" when the average of the true correlation matrices is
asymptotically nonsingular [our hypothesis (H)].
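
The two-step procedure just described can be sketched in Python for the simplest possible setting: the identity link (so that all marginal variances equal 1), a scalar parameter $\beta$ and $m = 2$ measurements per subject. Under these assumptions the first step is ordinary least squares and the pseudo-likelihood equation of the second step is linear in $\beta$, so its root has a closed form; for other link functions both steps would need an iterative solver such as Newton-Raphson. The helper names `inv2`, `two_step` and `pseudo_score` and the toy data are hypothetical and serve only this illustration.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def pseudo_score(x, y, q, beta):
    """sum_i x_i^T Q (y_i - x_i * beta) for the identity link, Q a 2x2 matrix."""
    return sum(x_i[j] * q[j][k] * (y_i[k] - x_i[k] * beta)
               for x_i, y_i in zip(x, y) for j in range(2) for k in range(2))


def two_step(x, y):
    """Two-step estimate of a scalar beta (identity link, m = 2)."""
    n = len(x)
    # Step 1: working-independence estimate (ordinary least squares here).
    num = sum(x_i[j] * y_i[j] for x_i, y_i in zip(x, y) for j in range(2))
    den = sum(x_i[j] ** 2 for x_i in x for j in range(2))
    beta_prelim = num / den
    # Step 2a: average of the outer products of the step-1 residuals.
    res = [[y_i[j] - x_i[j] * beta_prelim for j in range(2)]
           for x_i, y_i in zip(x, y)]
    r_hat = [[sum(e[j] * e[k] for e in res) / n for k in range(2)]
             for j in range(2)]
    q = inv2(r_hat)
    # Step 2b: the pseudo-score is linear in beta, so solve it in closed form.
    slope = pseudo_score(x, [[0.0, 0.0]] * n, q, -1.0)   # sum_i x_i^T Q x_i
    beta_final = pseudo_score(x, y, q, 0.0) / slope
    return beta_prelim, r_hat, beta_final


# Hypothetical data: 4 subjects, 2 measurements each.
x = [[1.0, 2.0], [1.0, 3.0], [2.0, 1.0], [0.0, 1.0]]
y = [[1.1, 2.3], [0.8, 3.1], [2.2, 0.7], [0.1, 1.2]]
beta_prelim, r_hat, beta_final = two_step(x, y)
```

The matrix `r_hat` is computed from the data alone, which is the point of the approach: no working correlation structure has to be guessed.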

As in Lai, Robbins and Wei (1979), where the linear model is treated, 
we relax the assumption of independence between subjects and consider 
residuals which form a martingale difference sequence. Thus our results are 
more general than results published so far, for example, Xie and Yang (2003) 
for GEE, and Shao (1992) for GLM. 

Since a GEE is not a derivative, most of the technical difficulties surface
when proving the existence of roots of such general estimating equations.
Two distinct methods have been developed to deal with this problem. One
gives a local solution of the GEE and relies on the classical proof of the 
inverse function theorem [Yuan and Jennrich (1998) and Schiopu-Kratina 
(2003)]. The other method, which uses a result from topology, was first 
brought into this context by Chen, Hu and Ying (1999) and was extensively 
used by Xie and Yang (2003) in their proof of consistency. We adopt this 
second method, which facilitates a comparison of our results to those of Xie 
and Yang (2003) and incorporates the inference results for GLM contained 
in the seminal work of Fahrmeir and Kaufmann (1985). 

This article is organized as follows. Section 2 is dedicated to the existence 
and weak consistency of a sequence of estimators of the main parameter. 
To accommodate the estimation of the average of the correlation matrices 
in the martingale set-up, we require two conditions: (C1) is a boundedness
condition on the $(2+\delta)$-moments of the normalized residuals, whereas (C2) is
a consistency condition on the normalized conditional covariance matrix. In 
this context we use the martingale strong law of large numbers of Kaufmann 
(1987). Section 3 presents the asymptotic normality of our estimators. This 
is obtained under slightly stronger conditions than those of Xie and Yang 
(2003), by applying the classical martingale central limit theorem [see Hall 
and Heyde (1980)]. For ease of exposition, we have placed the more technical 
proofs in the Appendix. 

We introduce first some matrix notation [see Schott (1997)]. If $A$ is a
$p \times p$ matrix, we will denote with $\|A\|$ its spectral norm, with $\det(A)$ its
determinant and with $\mathrm{tr}(A)$ its trace. If $A$ is a symmetric matrix, we denote
by $\lambda_{\min}(A)$ [$\lambda_{\max}(A)$] its minimum (maximum) eigenvalue. For any matrix
$A$, $\|A\| = \{\lambda_{\max}(A^TA)\}^{1/2}$. For a $p$-dimensional vector $x$, we use the
Euclidean norm $\|x\| = (x^Tx)^{1/2} = \mathrm{tr}(xx^T)^{1/2}$. We let $A^{1/2}$ be the symmetric
square root of a positive definite matrix $A$ and $A^{-1/2} = (A^{1/2})^{-1}$. Finally,
we use the matrix notation $A \le B$ if $\lambda^TA\lambda \le \lambda^TB\lambda$ for any $p$-dimensional
vector $\lambda$.

Throughout this article we will assume that the number of longitudinal
observations on each individual is fixed and equal to $m$. More precisely, we
will denote with $y_i := (y_{i1},\ldots,y_{im})^T$, $i \le n$, a longitudinal data set consisting
of $n$ respondents, where the components of $y_i$ represent measurements at
different times on subject $i$. The observations $y_{ij}$ are recorded along with
a corresponding $p$-dimensional vector $x_{ij}$ of covariates and the marginal
expectations and variances are specified in terms of the regression parameter
$\beta$ through $\theta_{ij} = x_{ij}^T\beta$ as follows:

(2)  $\mu_{ij}(\beta) := E_\beta(y_{ij}) = \mu(\theta_{ij}), \qquad \sigma_{ij}^2(\beta) := \mathrm{Var}_\beta(y_{ij}) = \dot\mu(\theta_{ij}),$

where $\mu$ is a continuously differentiable link function with $\dot\mu > 0$, that is, we
consider only canonical link functions.

Here are the most commonly used such link functions:

1. In the linear regression, $\mu(y) = y$.

2. In the log regression for count data, $\mu(y) = \exp(y)$.

3. In the logistic regression for binary data, $\mu(y) = \exp(y)/[1 + \exp(y)]$.

4. In the probit regression for binary data, $\mu(y) = \Phi(y)$, where $\Phi$ is the
standard normal distribution function; we have $\dot\Phi(y) = (2\pi)^{-1/2}\exp(-y^2/2)$.
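
The four functions listed above and their derivatives can be written down directly. The Python sketch below is only an illustration: the dictionary name `LINKS` and the helper `numeric_derivative` are assumptions of this example. It stores each pair $(\mu, \dot\mu)$ and compares $\dot\mu$ with a central finite difference at a few arbitrary points.

```python
import math

# (mu, mu_dot) pairs for the four examples listed above
LINKS = {
    "linear": (lambda t: t,
               lambda t: 1.0),
    "log": (math.exp,
            math.exp),
    "logistic": (lambda t: 1.0 / (1.0 + math.exp(-t)),
                 lambda t: math.exp(-t) / (1.0 + math.exp(-t)) ** 2),
    "probit": (lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))),
               lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)),
}


def numeric_derivative(f, t, h=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


# Largest discrepancy between the stated derivative and the numerical one.
max_gap = max(abs(numeric_derivative(mu, t) - mu_dot(t))
              for mu, mu_dot in LINKS.values()
              for t in (-1.0, 0.0, 0.7))
```

In each case $\dot\mu > 0$ everywhere, as required in (2).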

In the sequel the unknown parameter $\beta$ lies in an open set $B \subseteq \mathbb{R}^p$ and
$\beta_0$ is the true value of this parameter. We normally drop the parameter $\beta_0$
to avoid cumbersome notation.

Let $\mu_i(\beta) = (\mu_{i1}(\beta),\ldots,\mu_{im}(\beta))^T$, $A_i(\beta) = \mathrm{diag}(\sigma_{i1}^2(\beta),\ldots,\sigma_{im}^2(\beta))$ and
$\Sigma_i(\beta) := \mathrm{Cov}_\beta(y_i)$. Note that $\Sigma_i = A_i^{1/2}\bar R_iA_i^{1/2}$, where $\bar R_i$ is the true
correlation matrix of $y_i$ at $\beta_0$. Let $X_i = (x_{i1},\ldots,x_{im})^T$.

We consider the sequence $\varepsilon_i(\beta) = (\varepsilon_{i1}(\beta),\ldots,\varepsilon_{im}(\beta))^T$ with
$\varepsilon_{ij}(\beta) = y_{ij} - \mu_{ij}(\beta)$, and we assume that the residuals $(\varepsilon_i)_{i\ge1}$ form a
martingale difference sequence, that is,

$E(\varepsilon_i \mid \mathcal{F}_{i-1}) = 0$ for all $i \ge 1$,

where $\mathcal{F}_i$ is the minimal $\sigma$-field with respect to which $\varepsilon_1,\ldots,\varepsilon_i$ are measurable.
This is a natural generalization of the case of independent observations.

Finally, to avoid keeping track of various constants, we agree to denote 
with C a generic constant which does not depend on n, but is different from 
case to case. 



2. Asymptotic existence and consistency. We consider the generalized
estimating equations (GEE) of Xie and Yang (2003) in the case when the
"working" correlation matrices are $R_i^{indep} = I$ for all $i$. This is also known as
the "working independence" case, the word "independence" referring to the
observations on the same individual. Let $(\hat\beta_n)_n$ be a sequence of estimators
such that

(3)  $P(g_n^{indep}(\hat\beta_n) = 0) \to 1$  and  $\hat\beta_n \xrightarrow{P} \beta_0$,

where $g_n^{indep}(\beta) = \sum_{i=1}^n X_i^T\varepsilon_i(\beta)$ is the "working independence" GEE.

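The following quantities have been used extensively in the work of Xie and
Yang (2003) and play an important role in the conditions for the existence
and consistency of $\hat\beta_n$:

$H_n^{indep} = \sum_{i=1}^n X_i^TA_iX_i, \qquad \pi_n^{indep} := \frac{\max_{i\le n}\lambda_{\max}((R_i^{indep})^{-1})}{\min_{i\le n}\lambda_{\min}((R_i^{indep})^{-1})} = 1,$

$\tilde\tau_n^{indep} := m\max_{i\le n}\lambda_{\max}((R_i^{indep})^{-1}) = m,$

$(\gamma_n^{(0)})^{indep} := \max_{i\le n,\,j\le m} x_{ij}^T(H_n^{indep})^{-1}x_{ij}.$

We will also use the following maxima:

$k_n^{[2]}(\beta) := \max_{i\le n}\max_{j\le m}\left|\frac{\ddot\mu(\theta_{ij})}{\dot\mu(\theta_{ij})}\right|, \qquad k_n^{[3]}(\beta) := \max_{i\le n}\max_{j\le m}\left|\frac{\mu^{(3)}(\theta_{ij})}{\dot\mu(\theta_{ij})}\right|.$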



The fact that the residuals $(\varepsilon_i)_{i\ge1}$ form a martingale difference sequence
does not change the proofs of Theorem 2 and Theorem A.1(ii) of Xie and
Yang (2003). Following their work, we conclude that the sufficient conditions
for the existence of a sequence $(\hat\beta_n)_n$ with the desired property (3) are:

(AH)$^{indep}$ for any $r > 0$, $k_n^{[l],indep} = \sup_{\beta\in B_n^{indep}(r)} k_n^{[l]}(\beta)$, $l = 2,3$, are bounded,
(I$_w^*$)$^{indep}$ $\lambda_{\min}(H_n^{indep}) \to \infty$,
(C$_w^*$)$^{indep}$ $n^{1/2}(\gamma_n^{(0)})^{indep} \to 0$,

where $B_n^{indep}(r) := \{\beta;\ \|(H_n^{indep})^{1/2}(\beta-\beta_0)\| \le m^{1/2}r\}$. We denote by (C)$^{indep}$
the set of conditions (AH)$^{indep}$, (I$_w^*$)$^{indep}$, (C$_w^*$)$^{indep}$.

It turns out that, in practice, the analyst will have to verify conditions
similar to (C)$^{indep}$ in order to produce the estimators that we propose (see
Remark 5). All the classical examples corresponding to our link functions
1-4 are within the scope of our theory. We present below two new examples.



Example 1. Suppose that $p = 2$. Let $x_{ij} = (a_{ij},b_{ij})^T$,
$u_n = \sum_{i\le n,j\le m}\sigma_{ij}^2a_{ij}^2$, $v_n = \sum_{i\le n,j\le m}\sigma_{ij}^2b_{ij}^2$ and
$w_n = \sum_{i\le n,j\le m}\sigma_{ij}^2a_{ij}b_{ij}$. In this case
$\lambda_{\max}(H_n^{indep}) = (u_n+v_n+d_n)/2$ and $\lambda_{\min}(H_n^{indep}) = (u_n+v_n-d_n)/2$, with
$d_n := \sqrt{(u_n-v_n)^2+4w_n^2}$. Note that $w_n = \sqrt{u_nv_n}\cos\theta_n$ for $\theta_n \in [0,\pi]$ and
$\det(H_n^{indep}) = u_nv_n\sin^2\theta_n$ [see also page 79 of McCullagh and Nelder (1989)].
Suppose that

$\alpha := \liminf_{n\to\infty}\sin^2\theta_n > 0.$

Since

$\frac{1}{\lambda_{\min}(H_n^{indep})} = \frac{\lambda_{\max}(H_n^{indep})}{\det(H_n^{indep})} = \frac{u_n+v_n+d_n}{2u_nv_n\sin^2\theta_n},$

one can show that condition (I$_w^*$)$^{indep}$ is equivalent to $\min(u_n,v_n) \to \infty$. On
the other hand,

$x_{ij}^T(H_n^{indep})^{-1}x_{ij} \le \frac{1}{\sin^2\theta_n}\left(\frac{|a_{ij}|}{u_n^{1/2}} + \frac{|b_{ij}|}{v_n^{1/2}}\right)^2.$

Condition (C$_w^*$)$^{indep}$ holds if

$n^{1/2}\max_{i\le n,j\le m}\left(\frac{|a_{ij}|}{u_n^{1/2}} + \frac{|b_{ij}|}{v_n^{1/2}}\right)^2 \to 0.$
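
The closed-form eigenvalues and the determinant identity used in Example 1 can be checked numerically. The Python sketch below uses hypothetical values for $a_{ij}$, $b_{ij}$ and $\sigma_{ij}^2$ (stored as flat lists over all pairs $(i,j)$); the function name `h_indep_2x2` is likewise an assumption of this example.

```python
import math


def h_indep_2x2(a, b, s2):
    """Return (u_n, v_n, w_n), the entries of the 2x2 matrix H_n^indep."""
    u = sum(s * ai * ai for s, ai in zip(s2, a))
    v = sum(s * bi * bi for s, bi in zip(s2, b))
    w = sum(s * ai * bi for s, ai, bi in zip(s2, a, b))
    return u, v, w


# Hypothetical covariates and variances, one entry per pair (i, j).
a = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 1.0, 2.0, 3.0]
s2 = [1.0, 2.0, 1.0, 0.5]

u, v, w = h_indep_2x2(a, b, s2)
d = math.sqrt((u - v) ** 2 + 4.0 * w * w)
lam_max, lam_min = (u + v + d) / 2.0, (u + v - d) / 2.0

# Angle representation w_n = sqrt(u_n v_n) cos(theta_n).
cos_theta = w / math.sqrt(u * v)
sin2_theta = 1.0 - cos_theta ** 2
```

The two eigenvalues sum to the trace $u_n + v_n$ and multiply to the determinant $u_nv_n - w_n^2 = u_nv_n\sin^2\theta_n$, which is what the example relies on.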






Example 2. The case of a single covariate with $p$ different levels [one-way
ANOVA; see also Example 3.13 of Shao (1999)] is usually treated
by identifying each of these levels with one of the $p$-dimensional vectors
$e_1,\ldots,e_p$, where $e_k$ has the $k$th component 1 and all the other components
0. We can say that $x_{ij} \in \{e_1,\ldots,e_p\}$ for all $i \le n$, $j \le m$. In this case, $H_n^{indep}$
is a diagonal matrix. More precisely,

$H_n^{indep} = \sum_{k=1}^p \nu_n^{(k)}e_ke_k^T,$

where $\nu_n^{(k)} = \sum_{i\le n,\,j\le m;\ x_{ij}=e_k}\sigma_{ij}^2$. Let $\nu_n = \min_{k\le p}\nu_n^{(k)}$. Condition (I$_w^*$)$^{indep}$ is
equivalent to $\nu_n \to \infty$ and condition (C$_w^*$)$^{indep}$ is equivalent to $n^{1/2}\nu_n^{-1} \to 0$.
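
For a concrete feel of the quantities in Example 2, the following Python sketch computes the diagonal entries $\nu_n^{(k)}$, their minimum $\nu_n$ and the ratio $n^{1/2}\nu_n^{-1}$ for a small hypothetical design. The function name `anova_conditions`, the level coding $0,\ldots,p-1$ and the data are assumptions made for this illustration.

```python
import math


def anova_conditions(levels, s2, p, n):
    """levels[t] in {0, ..., p-1} is the level of observation t (flattened over
    (i, j)); s2[t] is its variance sigma_ij^2."""
    nu = [0.0] * p
    for k, s in zip(levels, s2):
        nu[k] += s                      # k-th diagonal entry of H_n^indep
    nu_min = min(nu)
    return nu, nu_min, math.sqrt(n) / nu_min


# n = 4 subjects, m = 2 measurements each, p = 2 levels, unit variances.
levels = [0, 1, 0, 1, 0, 0, 1, 1]
s2 = [1.0] * 8
nu, nu_min, ratio = anova_conditions(levels, s2, p=2, n=4)
```

Both conditions of the example are statements about the behaviour of `nu_min` and `ratio` as $n$ grows, so in practice one would track them along an increasing sequence of sample sizes.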

The method introduced by Liang and Zeger (1986) and developed recently
in Xie and Yang (2003) relies heavily on the "working" correlation matrices
$R_i(\alpha)$ which are chosen arbitrarily by the statistician (possibly containing a
nuisance parameter $\alpha$) and are expected to be good approximations of the
unknown true correlation matrices $\bar R_i$.

In the present paper, we consider an alternative approach in which at
each step $n$, the "working" correlation matrices $R_i(\alpha)$, $i \le n$, are replaced
by the random matrix

$\widehat{\mathcal{R}}_n := \frac1n\sum_{i=1}^n A_i(\hat\beta_n)^{-1/2}\varepsilon_i(\hat\beta_n)\varepsilon_i(\hat\beta_n)^TA_i(\hat\beta_n)^{-1/2}$

which depends only on the data set and is shown to be a (possibly biased)
consistent estimator of the average of the true correlation matrices

$\bar R_n := \frac1n\sum_{i=1}^n\bar R_i.$
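
A minimal Python sketch of this estimator is given below. It takes the residuals $\varepsilon_i(\hat\beta_n)$ and the marginal variances $\sigma_{ij}^2(\hat\beta_n)$ as inputs, normalizes each residual by its standard deviation and averages the outer products over subjects. The function name `estimated_correlation` and the input values are hypothetical.

```python
import math


def estimated_correlation(residuals, variances):
    """Average over subjects of A_i^{-1/2} e_i e_i^T A_i^{-1/2} (nested lists)."""
    n, m = len(residuals), len(residuals[0])
    # normalized residuals: e_ij divided by sigma_ij
    ystar = [[e / math.sqrt(s) for e, s in zip(e_i, s_i)]
             for e_i, s_i in zip(residuals, variances)]
    return [[sum(row[j] * row[k] for row in ystar) / n for k in range(m)]
            for j in range(m)]


# Hypothetical inputs: n = 2 subjects, m = 2 measurements each.
residuals = [[1.0, 2.0], [-1.0, 2.0]]   # residuals at the preliminary estimate
variances = [[1.0, 4.0], [1.0, 4.0]]    # marginal variances at the same estimate
r_hat = estimated_correlation(residuals, variances)
```

The result is symmetric and nonnegative definite by construction, but its diagonal entries need not equal 1, so it is not itself a correlation matrix in finite samples.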

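The consistency of $\widehat{\mathcal{R}}_n$ is obtained under the following two conditions
imposed on the (normalized) residuals $y_i^* = A_i^{-1/2}\varepsilon_i$ with $E(y_i^*y_i^{*T}) = \bar R_i$:

(C1) there exists a $\delta \in (0,2]$ such that $\sup_{i\ge1}E(\|y_i^*\|^{2+\delta}) < \infty$,
(C2) $\frac1n\sum_{i=1}^nV_i \xrightarrow{P} 0$, where $V_i = E(y_i^*y_i^{*T} \mid \mathcal{F}_{i-1}) - \bar R_i$.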

Remark 1. Condition (C1) is a bounded moment requirement which
is usually needed for verifying the conditions of a martingale limit theorem,
while condition (C2) is satisfied if the observations are independent.
Condition (C2) is in fact a requirement on the (normalized) conditional
covariance matrix $\mathcal{V}_n = \sum_{i=1}^nE(y_i^*y_i^{*T} \mid \mathcal{F}_{i-1})$. More precisely, if the following
hypothesis holds true:

(H) there exists a constant $C > 0$ such that $\lambda_{\min}(\bar R_n) \ge C$ for all $n$,

then condition (C2) is equivalent to $\bar R_n^{-1/2}(\mathcal{V}_n/n)\bar R_n^{-1/2} - I \xrightarrow{P} 0$ [which is
similar to (3.1) of Hall and Heyde (1980) or (4.2) of Shao (1992)]. Note
that (H) is implied by the following stronger hypothesis, which is needed in
Section 3:

(H') There exists a constant $C > 0$ such that $\lambda_{\min}(\bar R_i) \ge C$ for all $i$.

Hypothesis (H') is satisfied if $\bar R_i = \bar R$ for all $i$, where $\bar R$ is nonsingular.

The following result is essential for all our developments.

Theorem 1. Let $R_n = E(\widehat{\mathcal{R}}_n)$. Under conditions (C)$^{indep}$, (C1) and
(C2), we have

$\widehat{\mathcal{R}}_n - R_n \xrightarrow{L^1} 0$ (elementwise).

If the convergence in condition (C2) is almost sure, then $\widehat{\mathcal{R}}_n - R_n \xrightarrow{a.s.} 0$
(elementwise). The same conclusion holds if $R_n$ is replaced by $\bar R_n$.

Proof. Let $\widetilde{\mathcal{R}}_n = n^{-1}\sum_{i=1}^nA_i^{-1/2}\varepsilon_i\varepsilon_i^TA_i^{-1/2}$ and note that $E(\widetilde{\mathcal{R}}_n) = \bar R_n$.
Our result will be a consequence of the following two propositions,
whose proofs are given in Appendix A.1. □

Proposition 1. Under conditions (C1) and (C2), we have

$\widetilde{\mathcal{R}}_n - \bar R_n \xrightarrow{L^1} 0$ (elementwise).

Proposition 2. Under conditions (C)$^{indep}$, (C1) and (C2), we have

$\widehat{\mathcal{R}}_n - \widetilde{\mathcal{R}}_n \xrightarrow{L^1} 0$ (elementwise).

In what follows we will assume that the inverse of the (nonnegative definite)
random matrix $\widehat{\mathcal{R}}_n$ exists with probability 1, for every $n$. We consider
the following pseudo-likelihood equation:

(4)  $\sum_{i=1}^nD_i(\beta)^TV_{i,n}(\beta)^{-1}\varepsilon_i(\beta) = 0,$

where $D_i(\beta) = A_i(\beta)X_i$ and $V_{i,n}(\beta) := A_i(\beta)^{1/2}\widehat{\mathcal{R}}_nA_i(\beta)^{1/2}$. Note that (4)
can be written as

$\hat g_n(\beta) := \sum_{i=1}^nX_i^TA_i(\beta)^{1/2}\widehat{\mathcal{R}}_n^{-1}A_i(\beta)^{-1/2}\varepsilon_i(\beta) = 0.$

We consider also the estimating function

$g_n(\beta) = \sum_{i=1}^nX_i^TA_i(\beta)^{1/2}R_n^{-1}A_i(\beta)^{-1/2}\varepsilon_i(\beta).$

Note that $M_n := \mathrm{Cov}(g_n) = \sum_{i=1}^nX_i^TA_i^{1/2}R_n^{-1}\bar R_iR_n^{-1}A_i^{1/2}X_i$.

As in Xie and Yang (2003), we introduce the following quantities:

$H_n := \sum_{i=1}^nX_i^TA_i^{1/2}R_n^{-1}A_i^{1/2}X_i, \qquad \pi_n := \frac{\lambda_{\max}(R_n^{-1})}{\lambda_{\min}(R_n^{-1})}, \qquad \tilde\tau_n := m\lambda_{\max}(R_n^{-1}),$

$\gamma_n^{(0)} := \max_{i=1,\ldots,n}\ \max_{j=1,\ldots,m}(x_{ij}^TH_n^{-1}x_{ij}), \qquad \tilde\gamma_n := \tilde\tau_n\gamma_n^{(0)}.$



Remark 2. A few comments about $\tilde\tau_n$ are worth mentioning. First,
$M_n \le \tau_nH_n$, where $\tau_n := \max_{i\le n}\lambda_{\max}(R_n^{-1}\bar R_i) \le \tilde\tau_n$. Also, since
$r_{jk}^{(n)} - \bar r_{jk}^{(n)} \to 0$ and $|\bar r_{jk}^{(n)}| \le 1$, we can assume that $|r_{jk}^{(n)}| \le 2$, for $n$ large enough
(here $r_{jk}^{(n)}$, $\bar r_{jk}^{(n)}$ are elements of the matrices $R_n$, resp. $\bar R_n$). Therefore $\tilde\tau_n \ge 1/2$.
The reason why we prefer to work with $\tilde\tau_n$ instead of $\tau_n$ will become apparent
in the proof of Proposition 3 (given in Appendix A.2). Another reason
is, of course, the fact that $\tilde\tau_n$ does not depend on the unknown matrices $\bar R_i$.

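Our approach requires a slight modification of the conditions introduced
by Xie and Yang (2003) to accommodate the use of $\tilde\tau_n$ instead of $\tau_n$. Let
$\tilde B_n(r) := \{\beta;\ \|H_n^{1/2}(\beta-\beta_0)\| \le (\tilde\tau_n)^{1/2}r\}$. Our conditions are:

$(\widetilde{AH})$ for any $r > 0$, $\tilde k_n^{[l]} = \sup_{\beta\in\tilde B_n(r)}k_n^{[l]}(\beta)$, $l = 2,3$, are bounded,
$(\tilde I_w)$ $(\tilde\tau_n)^{-1}\lambda_{\min}(H_n) \to \infty$,
$(\tilde C_w)$ $(\pi_n)^2\tilde\gamma_n \to 0$, and $n^{1/2}\pi_n\tilde\gamma_n \to 0$.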

Remark 3. Note that $(\tilde I_w)$ implies (I$_w^*$)$^{indep}$, which implies $\lambda_{\min}(H_n) \to \infty$.
This follows from the inequalities

$\frac{1}{2m}H_n^{indep} \le \lambda_{\min}(R_n^{-1})H_n^{indep} \le H_n \le \lambda_{\max}(R_n^{-1})H_n^{indep} = \frac{\tilde\tau_n}{m}H_n^{indep}.$

Remark 4. Our conditions depend on the matrix $R_n$, which cannot be
written in a closed form. Since $\widehat{\mathcal{R}}_n - R_n \xrightarrow{P} 0$, it is desirable to express our
conditions in terms of the matrix $\widehat{\mathcal{R}}_n$. In practice, if the sample size is large
enough, one may choose to verify conditions $(\widetilde{AH})$, $(\tilde I_w)$, $(\tilde C_w)$ by using $\widehat{\mathcal{R}}_n$
(instead of $R_n$) in the definitions of $H_n$, $\pi_n$, $\tilde\gamma_n$.

Remark 5. If we suppose that hypothesis (H) holds, then for $n$ large

$\frac{C}{2} \le \lambda_{\min}(R_n) \le \lambda_{\max}(R_n) \le 2m.$

In this case $(\tilde\tau_n)_n$ and $(\pi_n)_n$ are bounded, $C'(\gamma_n^{(0)})^{indep} \le \tilde\gamma_n \le C(\gamma_n^{(0)})^{indep}$,
and for every $r > 0$ there exists $r' > 0$ such that $\tilde B_n(r) \subseteq B_n^{indep}(r')$. Therefore,
conditions $(\widetilde{AH})$, $(\tilde I_w)$, $(\tilde C_w)$ are equivalent to (AH)$^{indep}$, (I$_w^*$)$^{indep}$, (C$_w^*$)$^{indep}$,
respectively. In order to verify (H), it is sufficient to check that there exists
a constant $C > 0$ such that

$\det(\widehat{\mathcal{R}}_n) \ge C$ for all $n$ a.s.

under the hypothesis of Theorem 1.
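We need to consider the derivatives

$\widehat{\mathcal{D}}_n(\beta) := -\frac{\partial\hat g_n(\beta)}{\partial\beta^T}, \qquad \mathcal{D}_n(\beta) := -\frac{\partial g_n(\beta)}{\partial\beta^T}.$

The next theorem is a modified version of Theorem A.2, respectively,
Theorem A.1(ii) of Xie and Yang (2003).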

Theorem 2. Under conditions $(\widetilde{AH})$ and $(\tilde C_w)$:

(i) for every $r > 0$

$\sup_{\beta\in\tilde B_n(r)}\|H_n^{-1/2}\mathcal{D}_n(\beta)H_n^{-1/2} - I\| \xrightarrow{P} 0;$

(ii) there exists $c_0 > 0$ such that for every $r > 0$

$P(\mathcal{D}_n(\beta) \ge c_0H_n$ for all $\beta \in \tilde B_n(r)) \to 1.$

Proof. (i) The first two terms produced by the decomposition
$\mathcal{D}_n(\beta) = H_n(\beta) + B_n(\beta) + \mathcal{E}_n(\beta)$ are shown to be bounded by $\pi_n\tilde\gamma_n^{1/2}$, whereas the
third term is bounded in $L^2$ by $n^{1/2}\pi_n\tilde\gamma_n$. [Here $H_n(\beta)$, $B_n(\beta)$, $\mathcal{E}_n(\beta)$ have the
same expressions as those given in Xie and Yang (2003) with $R_i(\alpha)$, $i \le n$,
replaced by $R_n$.] The arguments are essentially the same as those used in
Lemmas A.1(ii), A.2(ii) and A.3(ii) of Xie and Yang (2003). The fact that we
are replacing the "working" correlation matrices $R_i(\alpha)$, $i = 1,\ldots,n$, with the
matrix $R_n$ and we assume that $(\varepsilon_i)_{i\ge1}$ is a martingale difference sequence
does not influence the proof. Finally we note that (ii) is a consequence of (i).
□



The next two results are intermediate steps that are used in the proof of
our main result. Their proofs are given in Appendix A.2.

Proposition 3. Suppose that the conditions of Theorem 1 hold. Then

$(\tilde\tau_n)^{-1/2}H_n^{-1/2}(\hat g_n - g_n) \xrightarrow{P} 0.$

Proposition 4. Suppose that the conditions of Theorem 1 hold. Under
conditions $(\widetilde{AH})$ and $(\tilde C_w)$,

$\sup_{\beta\in\tilde B_n(r)}\|H_n^{-1/2}[\widehat{\mathcal{D}}_n(\beta) - \mathcal{D}_n(\beta)]H_n^{-1/2}\| \xrightarrow{P} 0.$

The next theorem is our main result. It shows that under our slightly modified
conditions $(\widetilde{AH})$, $(\tilde I_w)$, $(\tilde C_w)$ and the additional conditions of Theorem 1,
one can obtain a solution $\tilde\beta_n$ of the pseudo-likelihood equation $\hat g_n(\beta) = 0$,
which is also a consistent estimator of $\beta_0$.

Theorem 3. Suppose that the conditions of Theorem 1 hold. Under
conditions $(\widetilde{AH})$, $(\tilde I_w)$ and $(\tilde C_w)$, there exists a sequence $(\tilde\beta_n)_n$ of random
variables such that

$P(\hat g_n(\tilde\beta_n) = 0) \to 1$  and  $\tilde\beta_n \xrightarrow{P} \beta_0.$

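Proof. Let $\varepsilon > 0$ be arbitrary and $r = r(\varepsilon) = \sqrt{(24p)/(c_1^2\varepsilon)}$, where $c_1$ is
a constant to be specified later. We consider the events

$E_n := \left\{\|H_n^{-1/2}\hat g_n\| \le \inf_{\beta\in\partial\tilde B_n(r)}\|H_n^{-1/2}(\hat g_n(\beta) - \hat g_n)\|\right\},$

$\Omega_n := \{\widehat{\mathcal{D}}_n(\beta)$ nonsingular, for all $\beta \in \tilde B_n(r)\}.$

By Lemma A of Chen, Hu and Ying (1999), it follows that on the event
$E_n \cap \Omega_n$, there exists $\tilde\beta_n \in \tilde B_n(r)$ such that $\hat g_n(\tilde\beta_n) = 0$. Therefore, it remains
to prove that $P(E_n \cap \Omega_n) > 1 - \varepsilon$ for $n$ large.

By Taylor's formula and Lemma 1 of Xie and Yang (2003) we obtain that
for any $\beta \in \partial\tilde B_n(r)$ there exist $\bar\beta \in \tilde B_n(r)$ and a $p \times 1$ vector $\lambda$, $\|\lambda\| = 1$ such
that

$\|H_n^{-1/2}(\hat g_n(\beta) - \hat g_n)\|$
  $\ge |\lambda^TH_n^{-1/2}\widehat{\mathcal{D}}_n(\bar\beta)H_n^{-1/2}\lambda| \cdot r(\tilde\tau_n)^{1/2}$
  $\ge \{|\lambda^TH_n^{-1/2}\mathcal{D}_n(\bar\beta)H_n^{-1/2}\lambda| - |\lambda^TH_n^{-1/2}[\widehat{\mathcal{D}}_n(\bar\beta) - \mathcal{D}_n(\bar\beta)]H_n^{-1/2}\lambda|\} \cdot r(\tilde\tau_n)^{1/2}.$

By Theorem 2(ii) there exists $c_0 > 0$ such that

(5)  $P(\lambda^TH_n^{-1/2}\mathcal{D}_n(\beta)H_n^{-1/2}\lambda \ge c_0$ for all $\beta \in \tilde B_n(r)$, for all $\lambda$, $\|\lambda\| = 1) > 1 - \varepsilon/6$

when $n$ is large. Let $c_0' \in (0,c_0)$ be arbitrary. By Proposition 4,

(6)  $P(|\lambda^TH_n^{-1/2}[\widehat{\mathcal{D}}_n(\beta) - \mathcal{D}_n(\beta)]H_n^{-1/2}\lambda| \le c_0'$ for all $\beta \in \tilde B_n(r)$, for all $\lambda$, $\|\lambda\| = 1) > 1 - \varepsilon/6$

when $n$ is large. Therefore, if we put $c_1 := c_0 - c_0'$, we have

(7)  $P\left(\inf_{\beta\in\partial\tilde B_n(r)}\|H_n^{-1/2}(\hat g_n(\beta) - \hat g_n)\| \ge c_1r(\tilde\tau_n)^{1/2}\right) > 1 - \varepsilon/3.$

From (5) and (6) we can also conclude that $P(\Omega_n) > 1 - \varepsilon/3$ for $n$ large.

On the other hand, by Chebyshev's inequality and our choice of $r$, we
have $P(\|H_n^{-1/2}g_n\| \le c_1r(\tilde\tau_n)^{1/2}/2) > 1 - \varepsilon/6$ for all $n$. By Proposition 3,
$P(\|H_n^{-1/2}(\hat g_n - g_n)\| \le c_1r(\tilde\tau_n)^{1/2}/2) > 1 - \varepsilon/6$ for $n$ large. Hence

(8)  $P(\|H_n^{-1/2}\hat g_n\| \le c_1r(\tilde\tau_n)^{1/2}) > 1 - \varepsilon/3.$

From (7) and (8) we obtain that $P(E_n) > 1 - (2\varepsilon)/3$ for $n$ large. This
concludes the proof of the asymptotic existence.

We proceed now with the proof of the weak consistency. Let $\delta > 0$ be
arbitrary. By $(\tilde I_w)$ we have $\tilde\tau_n/\lambda_{\min}(H_n) \le (\delta/r)^2$ for $n$ large. We know
that on the event $E_n \cap \Omega_n$, there exists $\tilde\beta_n \in \tilde B_n(r)$ such that $\hat g_n(\tilde\beta_n) = 0$.
Therefore, on this event

$\|\tilde\beta_n - \beta_0\| \le \|H_n^{-1/2}\| \cdot \|H_n^{1/2}(\tilde\beta_n - \beta_0)\| \le [\lambda_{\min}(H_n)]^{-1/2} \cdot (\tilde\tau_n)^{1/2}r \le \delta$

for $n$ large. This proves that $P(\|\tilde\beta_n - \beta_0\| \le \delta) > 1 - \varepsilon$ for $n$ large. □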

3. Asymptotic normality. Let $c_n = \lambda_{\max}(M_n^{-1}H_n)$. In this section we
will suppose that $(c_n\tilde\tau_n)_n$ is bounded.

Theorem 4. Under the conditions of Theorem 3,

$M_n^{-1/2}\hat g_n = M_n^{-1/2}H_n(\tilde\beta_n - \beta_0) + o_P(1).$

Proof. On the set $\{\hat g_n(\tilde\beta_n) = 0,\ \tilde\beta_n \in \tilde B_n(r)\}$, we have
$\hat g_n = \widehat{\mathcal{D}}_n(\bar\beta_n)(\tilde\beta_n - \beta_0)$ for some $\bar\beta_n \in \tilde B_n(r)$ by Taylor's formula.
Multiplication with $M_n^{-1/2}$ yields

$M_n^{-1/2}\hat g_n = M_n^{-1/2}H_n^{1/2}\mathcal{A}_nH_n^{1/2}(\tilde\beta_n - \beta_0) + M_n^{-1/2}H_n(\tilde\beta_n - \beta_0),$

where $\mathcal{A}_n := H_n^{-1/2}\widehat{\mathcal{D}}_n(\bar\beta_n)H_n^{-1/2} - I = o_P(1)$, by Theorem 2(i) and Proposition 4.
The result follows since $\|M_n^{-1/2}H_n^{1/2}\| \le c_n^{1/2}$ and $\|H_n^{1/2}(\tilde\beta_n - \beta_0)\| \le (\tilde\tau_n)^{1/2}r$.
□




Let $\gamma_n^{(D)} := \max_{1\le i\le n}\lambda_{\max}(H_n^{-1/2}X_i^TA_i^{1/2}R_n^{-1}A_i^{1/2}X_iH_n^{-1/2})$. Note that
$\gamma_n^{(D)} \le Cd_n\tilde\gamma_n$, where $d_n = \max_{i\le n,\,j\le m}\sigma_{ij}^2$. We consider the following conditions:

(N$_\delta$) there exists a $\delta > 0$ such that:
  (i) $Y := \sup_{i\ge1}E(\|y_i^*\|^{2+\delta} \mid \mathcal{F}_{i-1}) < \infty$ a.s.;
  (ii) $(c_n\tilde\tau_n)^{1+2/\delta}\gamma_n^{(D)} \to 0$,
(C2)' $\max_{i\le n}\lambda_{\max}(V_i) \xrightarrow{P} 0$.

Remark 6. Note that condition (N$_\delta$)(i), with $Y$ integrable, implies
condition (C1), whereas condition (C2)' is a stronger form of (C2). Part (ii) of
condition (N$_\delta$) was introduced by Xie and Yang (2003).

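The following result gives the asymptotic distribution of $\hat g_n$.

Lemma 1. Suppose that the conditions of Theorem 1 hold. Under
conditions (N$_\delta$), (C2)' and (H')

$M_n^{-1/2}\hat g_n \xrightarrow{d} N(0,I).$

Proof. We note that

$M_n^{-1/2}\hat g_n = M_n^{-1/2}g_n + M_n^{-1/2}(\hat g_n - g_n)$

and $\|M_n^{-1/2}(\hat g_n - g_n)\| \le (c_n\tilde\tau_n)^{1/2}\|(\tilde\tau_n)^{-1/2}H_n^{-1/2}(\hat g_n - g_n)\| \xrightarrow{P} 0$, by
Proposition 3. Therefore it is enough to prove that $M_n^{-1/2}g_n \xrightarrow{d} N(0,I)$. By the
Cramér-Wold theorem, this is equivalent to showing that: $\forall\lambda$, $\|\lambda\| = 1$

(9)  $\lambda^TM_n^{-1/2}g_n = \sum_{i=1}^nZ_{n,i} \xrightarrow{d} N(0,1),$

where $Z_{n,i} = \lambda^TM_n^{-1/2}X_i^TA_i^{1/2}R_n^{-1}A_i^{-1/2}\varepsilon_i$. Note that $E(Z_{n,i} \mid \mathcal{F}_{i-1}) = 0$ for
all $i \le n$, that is, $\{Z_{n,i};\ i \le n,\ n \ge 1\}$ is a martingale difference array.

Relationship (9) follows by the martingale central limit theorem with the
Lindeberg condition [see Corollary 3.1 of Hall and Heyde (1980)] if

(10)  $\sum_{i=1}^nE[Z_{n,i}^2I(|Z_{n,i}| > \varepsilon) \mid \mathcal{F}_{i-1}] \to 0$ a.s.

and

(11)  $\sum_{i=1}^nE(Z_{n,i}^2 \mid \mathcal{F}_{i-1}) \xrightarrow{P} 1.$

Relationship (10) follows from condition (N$_\delta$) exactly as in Lemma 2 of Xie
and Yang (2003) with $\psi(t) = t^{\delta/2}$. Relationship (11) follows from conditions
(C2)' and (H'):

$\left|\sum_{i=1}^nE(Z_{n,i}^2 \mid \mathcal{F}_{i-1}) - 1\right|$
  $= \left|\sum_{i=1}^n[E(Z_{n,i}^2 \mid \mathcal{F}_{i-1}) - E(Z_{n,i}^2)]\right|$
  $= \left|\sum_{i=1}^n\lambda^TM_n^{-1/2}X_i^TA_i^{1/2}R_n^{-1}V_iR_n^{-1}A_i^{1/2}X_iM_n^{-1/2}\lambda\right|$
  $\le \max_{1\le i\le n}\lambda_{\max}(V_i) \cdot \max_{1\le i\le n}\lambda_{\max}(\bar R_i^{-1}) \cdot \lambda^TM_n^{-1/2}M_nM_n^{-1/2}\lambda$
  $\le C^{-1}\max_{1\le i\le n}\lambda_{\max}(V_i) \xrightarrow{P} 0.$ □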

Putting together the results in Theorem 4 and Lemma 1, we obtain the
asymptotic normality of the estimator $\tilde\beta_n$.

Theorem 5. Under the conditions of Theorem 3 and conditions (N$_\delta$),
(C2)' and (H'),

$M_n^{-1/2}H_n(\tilde\beta_n - \beta_0) \xrightarrow{d} N(0,I).$

Remark 7. In applications we would need a version of Theorem 5 where 
M n is replaced by a consistent estimator. We suggest the estimator proposed 
by Liang and Zeger (1986) [see also Remark 8 of Xie and Yang (2003)]. The 
details of the proof are omitted. 

APPENDIX 

A.1. The following lemma is a consequence of Kaufmann's (1987) martingale
strong law of large numbers and can be viewed as a stronger version
of Theorem 2.19 of Hall and Heyde (1980).

Lemma A.1. Let $(x_i)_{i\ge1}$ be a sequence of random variables and let
$(\mathcal{F}_i)_{i\ge1}$ be a sequence of increasing $\sigma$-fields such that $x_i$ is $\mathcal{F}_i$-measurable
for every $i \ge 1$. Suppose that $\sup_iE|x_i|^\alpha < \infty$ for some $\alpha \in (1,2]$. Then

$\frac1n\sum_{i=1}^n(x_i - E(x_i \mid \mathcal{F}_{i-1})) \to 0$ a.s. and in $L^\alpha$.

Proof. Note that $y_i = x_i - E(x_i \mid \mathcal{F}_{i-1})$, $i \ge 1$, is a martingale difference
sequence. By the conditional Jensen inequality

$|y_i|^\alpha \le 2^{\alpha-1}\{|x_i|^\alpha + |E(x_i \mid \mathcal{F}_{i-1})|^\alpha\} \le 2^{\alpha-1}\{|x_i|^\alpha + E(|x_i|^\alpha \mid \mathcal{F}_{i-1})\}$

and $\sup_{i\ge1}E|y_i|^\alpha \le 2^\alpha\sup_{i\ge1}E|x_i|^\alpha < \infty$. Hence

$\sum_{i\ge1}\frac{E|y_i|^\alpha}{i^\alpha} \le C\sum_{i\ge1}\frac{1}{i^\alpha} < \infty.$

The lemma follows by Theorem 2 of Kaufmann (1987) with $p = 1$, $B_i = i^{-1}$.
□

Proof of Proposition 1. We denote by $\tilde r_{jk}^{(n)}$, $\bar r_{jk}^{(n)}$, $v_{jk}^{(i)}$ ($j,k = 1,\ldots,m$)
the elements of the matrices $\widetilde{\mathcal{R}}_n$, $\bar R_n$, $V_i$, respectively. We write

(12)  $\tilde r_{jk}^{(n)} - \bar r_{jk}^{(n)} = \frac1n\sum_{i=1}^n(y_{ij}^*y_{ik}^* - E(y_{ij}^*y_{ik}^* \mid \mathcal{F}_{i-1})) + \frac1n\sum_{i=1}^nv_{jk}^{(i)}.$

The first term converges to zero almost surely and in $L^{1+\delta/2}$ by applying
Lemma A.1 with $x_i = y_{ij}^*y_{ik}^*$, and using condition (C1). The second term
converges to zero in probability by condition (C2). This convergence is also
in $L^{1+\delta/2}$ because the sequence $\{n^{-1}\sum_{i=1}^nv_{jk}^{(i)}\}_n$ has uniformly bounded
moments of order $1+\delta/2$ and hence is uniformly integrable. □

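Proof of Proposition 2. We denote by $\hat r_{jk}^{(n)}$ ($j,k = 1,\ldots,m$) the elements
of the matrix $\widehat{\mathcal{R}}_n$. Let $\delta_{ijk} := [\sigma_{ij}\sigma_{ik}]/[\sigma_{ij}(\hat\beta_n)\sigma_{ik}(\hat\beta_n)] - 1$,
$\Delta\mu_{ij} := \mu_{ij}(\hat\beta_n) - \mu_{ij}(\beta_0)$ and

$\Delta(\varepsilon_{ij}\varepsilon_{ik}) := \varepsilon_{ij}(\hat\beta_n)\varepsilon_{ik}(\hat\beta_n) - \varepsilon_{ij}\varepsilon_{ik} = (\Delta\mu_{ij})(\Delta\mu_{ik}) - (\Delta\mu_{ij})\varepsilon_{ik} - (\Delta\mu_{ik})\varepsilon_{ij}.$

With this notation, we have

$\hat r_{jk}^{(n)} - \tilde r_{jk}^{(n)} = \frac1n\sum_{i=1}^n\frac{\varepsilon_{ij}(\hat\beta_n)\varepsilon_{ik}(\hat\beta_n)}{\sigma_{ij}(\hat\beta_n)\sigma_{ik}(\hat\beta_n)} - \frac1n\sum_{i=1}^n\frac{\varepsilon_{ij}\varepsilon_{ik}}{\sigma_{ij}\sigma_{ik}}$
  $= \frac1n\sum_{i=1}^n\frac{\Delta(\varepsilon_{ij}\varepsilon_{ik})}{\sigma_{ij}\sigma_{ik}} + \frac1n\sum_{i=1}^n\frac{\Delta(\varepsilon_{ij}\varepsilon_{ik})}{\sigma_{ij}\sigma_{ik}}\delta_{ijk} + \frac1n\sum_{i=1}^n\frac{\varepsilon_{ij}\varepsilon_{ik}}{\sigma_{ij}\sigma_{ik}}\delta_{ijk}.$

From here, we conclude that

$|\hat r_{jk}^{(n)} - \tilde r_{jk}^{(n)}| \le U_{n,jk} + \max_{i\le n}|\delta_{ijk}| \cdot \left\{U_{n,jk} + \frac1n\sum_{i=1}^n|y_{ij}^*y_{ik}^*|\right\},$

where

$U_{n,jk} := \frac1n\sum_{i=1}^n\frac{|\Delta\mu_{ij}| \cdot |\Delta\mu_{ik}|}{\sigma_{ij}\sigma_{ik}} + \frac1n\sum_{i=1}^n\frac{|\Delta\mu_{ij}|}{\sigma_{ij}}|y_{ik}^*| + \frac1n\sum_{i=1}^n\frac{|\Delta\mu_{ik}|}{\sigma_{ik}}|y_{ij}^*| =: U_{n,jk}^{[1]} + U_{n,jk}^{[2]} + U_{n,jk}^{[3]}.$

Recall that our estimator $\hat\beta_n$ was obtained in the proof of Theorem 2 of
Xie and Yang (2003) as a solution of the GEE in the case when all the
"working" correlation matrices are $R_i^{indep} = I$. One of the consequences of
the result of Xie and Yang is that for every fixed $\varepsilon > 0$, there exist $r = r_\varepsilon$
and $N = N_\varepsilon$ such that, if we denote $\Omega_{n,\varepsilon} = \{\hat\beta_n$ lies in $B_n^{indep}(r)\}$, then

$P(\Omega_{n,\varepsilon}) \ge 1 - \varepsilon$ for all $n \ge N$.

We define $\hat\beta_n$ to be equal to $\beta_0$ on the event $\Omega_{n,\varepsilon}^c$. Therefore,

on $\Omega_{n,\varepsilon}^c$:  $\max_{i\le n}|\delta_{ijk}| = 0$ and $\Delta\mu_{ij} = 0$.

Using Taylor's formula and condition (AH)$^{indep}$, we can conclude that on
the event $\Omega_{n,\varepsilon}$, there exists a constant $C = C_\varepsilon$ such that

$|\delta_{ijk}| \le C \cdot [(\gamma_n^{(0)})^{indep}]^{1/2} \cdot (m^{1/2}r)$ for all $i \le n$,

$\frac1n\sum_{i=1}^n\frac{(\Delta\mu_{ij})^2}{\sigma_{ij}^2} \le n^{-1}(\hat\beta_n - \beta_0)^T\left\{\sum_{i=1}^nX_i^TA_i(\bar\beta_n)A_i^{-1}A_i(\bar\beta_n)X_i\right\}(\hat\beta_n - \beta_0)$
  $\le n^{-1}\max_{i\le n}\lambda_{\max}^2[A_i(\bar\beta_n)A_i^{-1}] \cdot \|(H_n^{indep})^{1/2}(\hat\beta_n - \beta_0)\|^2$
  $\le Cn^{-1}(mr^2).$

Note also that $E[n^{-1}\sum_{i=1}^n(y_{ij}^*)^2] = \bar r_{jj}^{(n)} = 1$. Applying the Cauchy-Schwarz
inequality to each of the three sums that form $U_{n,jk}$, we can conclude that

$E[U_{n,jk}^{[1]}] \to 0$ and $E[(U_{n,jk}^{[l]})^2] \to 0$, $l = 2,3$.

On the other hand,

$E\left[\max_{i\le n}|\delta_{ijk}| \cdot U_{n,jk}\right] = \int_{\Omega_{n,\varepsilon}}\max_{i\le n}|\delta_{ijk}| \cdot U_{n,jk}\,dP \le C[(\gamma_n^{(0)})^{indep}]^{1/2}E[U_{n,jk}] \to 0,$

$E\left[\max_{i\le n}|\delta_{ijk}| \cdot \frac1n\sum_{i=1}^n|y_{ij}^*y_{ik}^*|\right] \le C[(\gamma_n^{(0)})^{indep}]^{1/2} \to 0.$ □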



A.2. 



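Proof of Proposition 3. Let $h_{ijk}^{(n)} = [\sigma_{ij}^2/\sigma_{ik}^2]^{1/2}$,
$\widehat{\mathcal{R}}_n^{-1} := \hat Q_n = (\hat q_{jk}^{(n)})_{j,k=1,\ldots,m}$ and $R_n^{-1} := Q_n = (q_{jk}^{(n)})_{j,k=1,\ldots,m}$.
With this notation, we write

$(\tilde\tau_n)^{-1/2}H_n^{-1/2}(\hat g_n - g_n) = \sum_{j,k=1}^m(\hat q_{jk}^{(n)} - q_{jk}^{(n)})\left\{(\tilde\tau_n)^{-1/2}H_n^{-1/2}\sum_{i=1}^nh_{ijk}^{(n)}x_{ij}\varepsilon_{ik}\right\}.$

By Theorem 1, $\hat q_{jk}^{(n)} - q_{jk}^{(n)} \xrightarrow{P} 0$ for every $j,k$. The result will follow once we
prove that $\{(\tilde\tau_n)^{-1/2}H_n^{-1/2}\sum_{i=1}^nh_{ijk}^{(n)}x_{ij}\varepsilon_{ik}\}_n$ is bounded in $L^2$ for every $j,k$.
Since $(\varepsilon_{ik})_{i\ge1}$ is a martingale difference sequence, we have

$E\left\|(\tilde\tau_n)^{-1/2}H_n^{-1/2}\sum_{i=1}^nh_{ijk}^{(n)}x_{ij}\varepsilon_{ik}\right\|^2$
  $= (\tilde\tau_n)^{-1}\mathrm{tr}\left\{H_n^{-1/2}\sum_{i=1}^n(h_{ijk}^{(n)})^2E(\varepsilon_{ik}^2)x_{ij}x_{ij}^TH_n^{-1/2}\right\}$
  $= (\tilde\tau_n)^{-1}\mathrm{tr}\left\{H_n^{-1/2}\sum_{i=1}^n\sigma_{ij}^2x_{ij}x_{ij}^TH_n^{-1/2}\right\}$
  $\le (\tilde\tau_n)^{-1}(4m\tilde\tau_n)\mathrm{tr}(I) = 4mp$

because $\sum_{i=1}^n\sigma_{ij}^2x_{ij}x_{ij}^T \le \sum_{i=1}^nX_i^TA_iX_i \le \lambda_{\max}(R_n)H_n \le 4m\tilde\tau_nH_n$. □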

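Proof of Proposition 4. We write

$\widehat{\mathcal{D}}_n(\beta) = \widehat H_n(\beta) + \widehat B_n(\beta) + \widehat{\mathcal{E}}_n(\beta), \qquad \mathcal{D}_n(\beta) = H_n(\beta) + B_n(\beta) + \mathcal{E}_n(\beta),$

where $\widehat H_n(\beta)$, $\widehat B_n(\beta)$, $\widehat{\mathcal{E}}_n(\beta)$ have the same expressions as $H_n(\beta)$, $B_n(\beta)$, $\mathcal{E}_n(\beta)$,
with $R_n$ replaced by $\widehat{\mathcal{R}}_n$. Our result will follow by the following three lemmas.
□

Lemma A.2. Suppose that condition $(\widetilde{AH})$ holds. If $(\pi_n\tilde\gamma_n)_n$ is bounded,
then for any $r > 0$ and for any $p \times 1$ vector $\lambda$ with $\|\lambda\| = 1$,

$\sup_{\beta\in\tilde B_n(r)}|\lambda^TH_n^{-1/2}[\widehat H_n(\beta) - H_n(\beta)]H_n^{-1/2}\lambda| \xrightarrow{P} 0.$

Lemma A.3. Suppose that condition $(\widetilde{AH})$ holds. If $(\pi_n^2\tilde\gamma_n)_n$ is bounded,
then for any $r > 0$ and for any $p \times 1$ vector $\lambda$ with $\|\lambda\| = 1$,

$\sup_{\beta\in\tilde B_n(r)}|\lambda^TH_n^{-1/2}[\widehat B_n(\beta) - B_n(\beta)]H_n^{-1/2}\lambda| \xrightarrow{P} 0.$

Lemma A.4. Suppose that condition $(\widetilde{AH})$ holds. If $(n^{1/2}\pi_n\tilde\gamma_n)_n$ is bounded,
then for any $r > 0$ and for any $p \times 1$ vector $\lambda$ with $\|\lambda\| = 1$,

$\sup_{\beta\in\tilde B_n(r)}|\lambda^TH_n^{-1/2}[\widehat{\mathcal{E}}_n(\beta) - \mathcal{E}_n(\beta)]H_n^{-1/2}\lambda| \xrightarrow{P} 0.$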

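Proof of Lemma A.2. Using Theorem 1 and the fact that $|\tilde r_{jk}^{(n)}| \le 2$
for $n$ large, we have $\Delta_n = \bar R_n^{1/2} \tilde{\mathcal R}_n^{-1} \bar R_n^{1/2} - I = \bar R_n^{1/2} (\tilde{\mathcal R}_n^{-1} - \bar R_n^{-1}) \bar R_n^{1/2} \stackrel{P}{\to} 0$
(elementwise). For every $\beta$,

\[
\begin{aligned}
\big| \lambda^T H_n^{-1/2} [\tilde{\mathcal H}_n(\beta) - H_n(\beta)] H_n^{-1/2} \lambda \big|
 &= \Big| \sum_{i=1}^n \lambda^T H_n^{-1/2} X_i^T A_i(\beta)^{1/2} \bar R_n^{-1/2} \Delta_n \bar R_n^{-1/2} A_i(\beta)^{1/2} X_i H_n^{-1/2} \lambda \Big| \\
 &\le \max\{ |\lambda_{\max}(\Delta_n)|, |\lambda_{\min}(\Delta_n)| \} \cdot \{ \lambda^T H_n^{-1/2} H_n(\beta) H_n^{-1/2} \lambda \}.
\end{aligned}
\]

The result follows, since one can show that for every $\beta \in \tilde B_n(r)$,

\[
\begin{aligned}
\big| \lambda^T H_n^{-1/2} H_n(\beta) H_n^{-1/2} \lambda - 1 \big|
 &\le \lambda^T H_n^{-1/2} H_n^{[1]}(\beta) H_n^{-1/2} \lambda + 2 \big| \lambda^T H_n^{-1/2} H_n^{[2]}(\beta) H_n^{-1/2} \lambda \big| \\
 &\le C \tilde\pi_n \tilde\gamma_n + 2C (\tilde\pi_n \tilde\gamma_n)^{1/2} \le C,
\end{aligned}
\tag{13}
\]

where

\[
H_n^{[1]}(\beta) = \sum_{i=1}^n X_i^T \big( A_i^{1/2}(\beta) - A_i^{1/2} \big) \bar R_n^{-1} \big( A_i^{1/2}(\beta) - A_i^{1/2} \big) X_i,
\]

\[
H_n^{[2]}(\beta) = \sum_{i=1}^n X_i^T \big( A_i^{1/2}(\beta) - A_i^{1/2} \big) \bar R_n^{-1} A_i^{1/2} X_i.
\]

We used the fact that $\sup_{\beta \in \tilde B_n(r)} \max_{i \le n} \lambda_{\max}\{ (A_i^{1/2}(\beta) A_i^{-1/2} - I)^2 \} \le C \tilde\gamma_n$,
which follows by condition (AH) as in Lemma B.1(ii) of Xie and Yang (2003).
□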
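
The eigenvalue bound used in the proof of Lemma A.2 is the elementary fact that, for a symmetric matrix $\Delta$, $|\sum_i u_i^T \Delta u_i| \le \max\{|\lambda_{\max}(\Delta)|, |\lambda_{\min}(\Delta)|\} \sum_i \|u_i\|^2$. The sketch below checks this numerically on simulated inputs. Everything in it is a hypothetical stand-in chosen for illustration and not taken from the paper: `r_bar` and `r_tilde` play the roles of the deterministic matrix and its estimator, and the random vectors `u_i` play the role of the vectors to which the bound is applied.

```python
import numpy as np


def quad_form_bound(delta, vectors):
    """Return (lhs, rhs) for |sum_i u_i^T delta u_i| <= rho(delta) * sum_i ||u_i||^2."""
    delta = 0.5 * (delta + delta.T)        # symmetrize to guard against round-off
    eigvals = np.linalg.eigvalsh(delta)    # eigenvalues in ascending order
    rho = max(abs(eigvals[-1]), abs(eigvals[0]))
    lhs = abs(sum(u @ delta @ u for u in vectors))
    rhs = rho * sum(u @ u for u in vectors)
    return lhs, rhs


rng = np.random.default_rng(0)
m, n = 4, 50                               # illustrative cluster size and sample size

# Hypothetical positive definite "target" matrix and a perturbed "estimate".
a = rng.normal(size=(m, m))
r_bar = a @ a.T + m * np.eye(m)
r_tilde = r_bar + 0.05 * np.eye(m)

# Symmetric square root of r_bar via its eigendecomposition.
w, v = np.linalg.eigh(r_bar)
r_bar_half = v @ np.diag(np.sqrt(w)) @ v.T

# Two equivalent forms of Delta_n, as in the proof.
delta_n = r_bar_half @ np.linalg.inv(r_tilde) @ r_bar_half - np.eye(m)
delta_alt = r_bar_half @ (np.linalg.inv(r_tilde) - np.linalg.inv(r_bar)) @ r_bar_half

vectors = [rng.normal(size=m) for _ in range(n)]
lhs, rhs = quad_form_bound(delta_n, vectors)
print(lhs <= rhs, np.allclose(delta_n, delta_alt))
```

The check is only a sanity test of the linear algebra; it says nothing about the rates at which $\Delta_n$ tends to zero, which is where Theorem 1 enters.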

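Proof of Lemma A.3. Let $w_{i,n}(\beta)^T = \lambda^T H_n^{-1/2} X_i^T G_i^{[1]}(\beta)\, \mathrm{diag}\{X_i H_n^{-1/2} \lambda\}\, \bar R_n^{-1/2} \Delta_n$
and $z_{i,n}(\beta) = \bar R_n^{-1/2} A_i(\beta)^{-1/2} (\mu_i - \mu_i(\beta))$. We have

\[
\big| \lambda^T H_n^{-1/2} [\tilde{\mathcal B}_n^{[1]}(\beta) - B_n^{[1]}(\beta)] H_n^{-1/2} \lambda \big|
 = \Big| \sum_{i=1}^n w_{i,n}(\beta)^T z_{i,n}(\beta) \Big|
 \le \Big[ \sum_{i=1}^n \| w_{i,n}(\beta) \|^2 \Big]^{1/2} \Big[ \sum_{i=1}^n \| z_{i,n}(\beta) \|^2 \Big]^{1/2}
\]

by using the Cauchy-Schwarz inequality. Methods similar to those developed
in the proof of Lemma A.2(ii) of Xie and Yang (2003) show that for any $\beta \in \tilde B_n(r)$,
$\sum_{i=1}^n \| w_{i,n}(\beta) \|^2 \le C \tilde\pi_n \tilde\gamma_n^{(0)} \|\Delta_n\|^2$ and
$\sum_{i=1}^n \| z_{i,n}(\beta) \|^2 \le C \tilde\pi_n \tilde\tau_n \lambda_{\max}(H_n^{-1/2} H_n(\beta) H_n^{-1/2}) \le C \tilde\pi_n \tilde\tau_n$
[using (13) for the last inequality]. Hence

\[
\sup_{\beta \in \tilde B_n(r)} \big| \lambda^T H_n^{-1/2} [\tilde{\mathcal B}_n^{[1]}(\beta) - B_n^{[1]}(\beta)] H_n^{-1/2} \lambda \big| \le C \|\Delta_n\| \tilde\pi_n (\tilde\gamma_n)^{1/2} \stackrel{P}{\to} 0.
\]

Let $v_{i,n}(\beta)^T = \lambda^T H_n^{-1/2} X_i^T A_i(\beta)^{1/2} \bar R_n^{-1/2} \Delta_n \bar R_n^{-1/2}\, \mathrm{diag}\{X_i H_n^{-1/2} \lambda\}\, G_i^{[2]}(\beta) A_i(\beta)^{1/2} \bar R_n^{1/2}$. We have

\[
\big| \lambda^T H_n^{-1/2} [\tilde{\mathcal B}_n^{[2]}(\beta) - B_n^{[2]}(\beta)] H_n^{-1/2} \lambda \big|
 = \Big| \sum_{i=1}^n v_{i,n}(\beta)^T z_{i,n}(\beta) \Big|
 \le \Big[ \sum_{i=1}^n \| v_{i,n}(\beta) \|^2 \Big]^{1/2} \Big[ \sum_{i=1}^n \| z_{i,n}(\beta) \|^2 \Big]^{1/2}.
\]

One can prove that for any $\beta \in \tilde B_n(r)$,
$\sum_{i=1}^n \| v_{i,n}(\beta) \|^2 \le C \tilde\pi_n \tilde\gamma_n^{(0)} \lambda_{\max}(\Delta_n^2) \{ \lambda^T H_n^{-1/2} H_n(\beta) H_n^{-1/2} \lambda \} \le C \tilde\pi_n \tilde\gamma_n^{(0)} \|\Delta_n\|^2$
[using (13) for the last inequality]. Hence

\[
\sup_{\beta \in \tilde B_n(r)} \big| \lambda^T H_n^{-1/2} [\tilde{\mathcal B}_n^{[2]}(\beta) - B_n^{[2]}(\beta)] H_n^{-1/2} \lambda \big| \le C \|\Delta_n\| \tilde\pi_n (\tilde\gamma_n)^{1/2} \stackrel{P}{\to} 0.
\]

□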

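Proof of Lemma A.4. We write $\tilde{\mathcal E}_n(\beta) - \mathcal E_n(\beta) = [\tilde{\mathcal E}_n^{[1]}(\beta) - \mathcal E_n^{[1]}(\beta)] + [\tilde{\mathcal E}_n^{[2]}(\beta) - \mathcal E_n^{[2]}(\beta)]$
and we use a decomposition which is similar to that given
in the proof of Lemma A.3(ii) of Xie and Yang (2003). More precisely, we
write

\[
\begin{aligned}
\lambda^T H_n^{-1/2} [\tilde{\mathcal E}_n^{[1]}(\beta) - \mathcal E_n^{[1]}(\beta)] H_n^{-1/2} \lambda &= T_n^{[1]} + T_n^{[3]}(\beta) + T_n^{[5]}(\beta), \\
\lambda^T H_n^{-1/2} [\tilde{\mathcal E}_n^{[2]}(\beta) - \mathcal E_n^{[2]}(\beta)] H_n^{-1/2} \lambda &= T_n^{[2]} + T_n^{[4]}(\beta) + T_n^{[6]}(\beta),
\end{aligned}
\]

where $T_n^{[l]}(\beta) = \sum_{j,k=1}^m (\tilde q_{jk}^{(n)} - \bar q_{jk}^{(n)}) \cdot S_{n,jk}^{[l]}(\beta)$ for $l = 1, \ldots, 6$ and

\[
\begin{aligned}
S_{n,jk}^{[1]} &= \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}]_j\, [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_j\, \sigma_{ij} x_{ij} \varepsilon_{ik}, \\
S_{n,jk}^{[3]}(\beta) &= \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}(\beta)]_j\, [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_j\, [A_i(\beta)^{-1/2} A_i^{1/2} - I]_k\, \sigma_{ij} x_{ij} \varepsilon_{ik}, \\
S_{n,jk}^{[5]}(\beta) &= \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} (G_i^{[1]}(\beta) - G_i^{[1]})]_j\, [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_j\, \sigma_{ij} x_{ij} \varepsilon_{ik}, \\
S_{n,jk}^{[2]} &= \lambda^T H_n^{-1/2} \sum_{i=1}^n [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_k\, [G_i^{[2]} A_i^{1/2}]_k\, \sigma_{ij} x_{ij} \varepsilon_{ik}, \\
S_{n,jk}^{[4]}(\beta) &= \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} A_i(\beta)^{1/2} - I]_j\, [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_k\, [G_i^{[2]}(\beta) A_i^{1/2}]_k\, \sigma_{ij} x_{ij} \varepsilon_{ik}, \\
S_{n,jk}^{[6]}(\beta) &= \lambda^T H_n^{-1/2} \sum_{i=1}^n [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_k\, [(G_i^{[2]}(\beta) - G_i^{[2]}) A_i^{1/2}]_k\, \sigma_{ij} x_{ij} \varepsilon_{ik}
\end{aligned}
\]

(here we have denoted with $[A]_j$ the $j$th element on the diagonal of a matrix
$A$).

Since $\tilde q_{jk}^{(n)} - \bar q_{jk}^{(n)} \stackrel{P}{\to} 0$, it is enough to prove that $\{S_{n,jk}^{[1]}\}_n$, $\{S_{n,jk}^{[2]}\}_n$ and
$\{\sup_{\beta \in \tilde B_n(r)} |S_{n,jk}^{[l]}(\beta)|\}_n$, $l = 3, 4, 5, 6$, are bounded in $L^2$ for every $j,k = 1, \ldots, m$.

We have

\[
E(|S_{n,jk}^{[1]}|^2) \le C \tilde\gamma_n^{(0)} \operatorname{tr}\Big\{ H_n^{-1/2} \Big( \sum_{i=1}^n \sigma_{ij}^2 x_{ij} x_{ij}^T \Big) H_n^{-1/2} \Big\} \le C \tilde\gamma_n^{(0)} (4mp\tilde\tau_n) = C \tilde\gamma_n \le C.
\]

By the Cauchy-Schwarz inequality, for every $\beta \in \tilde B_n(r)$,

\[
|S_{n,jk}^{[3]}(\beta)|^2 \le \Big\{ \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}(\beta)]_j^2\, [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_j^2\, [A_i(\beta)^{-1/2} A_i^{1/2} - I]_k^2 \Big\}
 \times \Big\{ \sum_{i=1}^n (\varepsilon_{ik})^2 \sigma_{ij}^2 (\lambda^T H_n^{-1/2} x_{ij})^2 \Big\}.
\]

Hence $E(\sup_{\beta \in \tilde B_n(r)} |S_{n,jk}^{[3]}(\beta)|^2) \le C n \tilde\gamma_n^{(0)} \tilde\gamma_n \cdot \{ \lambda^T H_n^{-1/2} ( \sum_{i=1}^n \sigma_{ij}^2 x_{ij} x_{ij}^T ) H_n^{-1/2} \lambda \} \le C n \tilde\gamma_n^{(0)} \tilde\gamma_n \cdot (4m\tilde\tau_n) \le C n (\tilde\gamma_n)^2 \le C$.

Similarly, by the Cauchy-Schwarz inequality, for every $\beta \in \tilde B_n(r)$,

\[
|S_{n,jk}^{[5]}(\beta)|^2 \le \Big\{ \sum_{i=1}^n [A_i^{-1/2} (G_i^{[1]}(\beta) - G_i^{[1]})]_j^2\, [\mathrm{diag}\{X_i H_n^{-1/2} \lambda\}]_j^2 \Big\}
 \times \Big\{ \sum_{i=1}^n (\varepsilon_{ik})^2 \sigma_{ij}^2 (\lambda^T H_n^{-1/2} x_{ij})^2 \Big\}
\]

and $E(\sup_{\beta \in \tilde B_n(r)} |S_{n,jk}^{[5]}(\beta)|^2) \le C n (\tilde\gamma_n)^2 \le C$.

The terms $S_{n,jk}^{[l]}(\beta)$, $l = 2, 4, 6$, can be treated by similar methods. □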

REFERENCES 

Chen, K., Hu, I. and Ying, Z. (1999). Strong consistency of maximum quasi-likelihood
estimators in generalized linear models with fixed and adaptive designs. Ann. Statist.
27 1155-1163. MR1740117

Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the
maximum likelihood estimator in generalized linear models. Ann. Statist. 13 342-368.
MR773172

Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized
Linear Models. Springer, New York. MR1284203

Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic
Press, New York. MR624435

Kaufmann, H. (1987). On the strong law of large numbers for multivariate martingales.
Stochastic Process. Appl. 26 73-85. MR917247

Lai, T. L., Robbins, H. and Wei, C. Z. (1979). Strong consistency of least squares
estimates in multiple regression. II. J. Multivariate Anal. 9 343-361. MR548786

Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized
linear models. Biometrika 73 13-22. MR836430

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman
and Hall, London. MR727836

Rao, J. N. K. (1998). Marginal models for repeated observations: Inference with survey
data. In Proc. Section on Survey Research Methods 76-82. Amer. Statist. Assoc.,
Alexandria, VA.

Schiopu-Kratina, I. (2003). Asymptotic results for generalized estimating equations with
data from complex surveys. Rev. Roumaine Math. Pures Appl. 48 327-342. MR2038208

Schott, J. R. (1997). Matrix Analysis for Statistics. Wiley, New York. MR1421574

Shao, J. (1992). Asymptotic theory in generalized linear models with nuisance scale
parameters. Probab. Theory Related Fields 91 25-41. MR1142760

Shao, J. (1999). Mathematical Statistics. Springer, New York. MR1670883

Xie, M. and Yang, Y. (2003). Asymptotics for generalized estimating equations with
large cluster sizes. Ann. Statist. 31 310-347. MR1962509

Yuan, K.-H. and Jennrich, R. I. (1998). Asymptotics of estimating equations under
natural conditions. J. Multivariate Anal. 65 245-260. MR1625893



Department of Mathematics
and Statistics
University of Ottawa
Ottawa, Ontario
Canada K1N 6N5
E-MAIL: rbala348@science.uottawa.ca

Statistics Canada
HSMD 16RHC
Ottawa, Ontario
Canada K1A 0T6
E-MAIL: ioana.schiopu-kratina@statcan.ca



